Abstract — We consider asynchronous point-to-point
communication. Building on a recently developed model,
we show that training-based schemes, i.e., communication
strategies that separate synchronization from information
transmission, perform suboptimally at high rate.

Index Terms — detection and isolation; sequential decod- 
ing; synchronization; training-based schemes 



I. Model and Review of Results 

We consider the asynchronous communication setting
developed in [1], which provides an extension to
Shannon's original point-to-point model for synchronous
communication [2].

We recall the setting in [1]. Communication takes
place over a discrete memoryless channel characterized
by its finite input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$,
respectively, and transition probability matrix $Q(y|x)$, for
all $y \in \mathcal{Y}$ and $x \in \mathcal{X}$. There are $M \geq 2$ messages
$\{1, 2, \ldots, M\}$. For each message $m$ there is an associated
codeword $c^N(m) \triangleq c_1(m) c_2(m) \ldots c_N(m)$, a
string of length $N$ composed of symbols from $\mathcal{X}$.¹ The
$M$ codewords form a codebook $\mathcal{C}_N$. The transmitter
selects a message $m$, randomly and uniformly over
the message set, and starts sending the corresponding
codeword $c^N(m)$ at a random time $\nu$, unknown to the
receiver, independent of $c^N(m)$, and uniformly distributed
in $\{1, 2, \ldots, A\}$. The transmitter and the receiver know
the integer $A \geq 1$, which we refer to as the asynchronism
level between the transmitter and the receiver. If $A = 1$
the channel is said to be synchronized. The capacity of
the synchronized channel $Q$ is denoted $C$, or $C(Q)$ when
necessary for clarity.

¹The symbol '$\triangleq$' stands for 'equal by definition.'

During information transmission the receiver observes
a noisy version of the sent codeword, while before and
after the information transmission it observes only noise.
Conditioned on the event $\{\nu = k\}$, $k \in \{1, 2, \ldots, A\}$,
and on the message $m$ to be conveyed, the receiver
observes independent symbols $Y_1, Y_2, \ldots$ distributed as
follows. If $i \in \{1, 2, \ldots, k-1\}$ or $i \in \{k+N, k+N+1, \ldots, A+N-1\}$,
the distribution of $Y_i$ is

\[ Q_\star(\cdot) \triangleq Q(\cdot|\star) \]

for some fixed $\star \in \mathcal{X}$. At any time $i \in \{k, k+1, \ldots, k+N-1\}$,
the distribution of $Y_i$ is

\[ Q(\cdot|c_{i-k+1}(m)). \]
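The output process described above can be simulated directly. The following is a minimal sketch, assuming the channel is given as a row-stochastic matrix `Q` indexed by input symbol and that symbols are small integers; the function name `simulate_outputs` and its arguments are illustrative and do not come from the paper.

```python
import random

def simulate_outputs(codeword, A, Q, star, rng):
    """Sample (nu, outputs) for one transmission over the asynchronous channel.

    codeword: list of input symbols c_1..c_N (integers indexing rows of Q)
    A:        asynchronism level; nu is uniform on {1, ..., A}
    Q:        Q[x][y] = probability of output y given input x (assumed format)
    star:     the 'no-input' symbol, fixed by the channel
    """
    N = len(codeword)
    nu = rng.randint(1, A)                 # start time, inclusive bounds
    outputs = []
    for i in range(1, A + N):              # times 1, ..., A + N - 1
        if nu <= i <= nu + N - 1:
            x = codeword[i - nu]           # c_{i - nu + 1} in 1-based notation
        else:
            x = star                       # pure noise before and after
        row = Q[x]
        outputs.append(rng.choices(range(len(row)), weights=row)[0])
    return nu, outputs
```

With a noiseless channel and `star = 0`, the returned sequence is all zeros except for the codeword sitting at positions `nu, ..., nu + N - 1`, which makes the role of the unknown start time easy to see.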

It should be emphasized that the transition probability
matrix $Q(\cdot|\cdot)$, together with the 'no-input' symbol $\star$,
characterizes the communication channel. In particular,
$\star$ is not a parameter of the transmitter, i.e., the system
designer cannot designate which symbol in the input
alphabet is $\star$. This symbol can, however, be used for
the codebook design. Throughout the paper, whenever
we refer to a certain channel $Q$, we implicitly assume
that the $\star$ symbol is given.

The decoder consists of a sequential test $(\tau_N, \phi_N)$,
where $\tau_N$ is a stopping time, bounded by $A + N - 1$,
with respect to the output sequence $Y_1, Y_2, \ldots$
indicating when decoding happens, and where $\phi_N$ denotes a
decision rule that declares the decoded message. Recall
that a stopping time $\tau$ (deterministic or randomized)
is an integer-valued random variable with respect to a
sequence of random variables $\{Y_i\}_{i=1}^{\infty}$ so that the event
$\{\tau = n\}$, conditioned on the realizations of $\{Y_i\}_{i=1}^{n}$,
is independent of those of $\{Y_i\}_{i=n+1}^{\infty}$, for all $n \geq 1$.
The function $\phi_N$ is then defined as any $\mathcal{F}_{\tau_N}$-measurable
map taking values in $\{1, 2, \ldots, M\}$, where $\mathcal{F}_1, \mathcal{F}_2, \ldots$
is the natural filtration induced by the output process
$Y_1, Y_2, \ldots$.

We are interested in reliable and quick decoding.
To that aim we first define the average decoding error
probability (given a codebook and a decoder) as

\[ \mathbb{P}(\mathcal{E}) \triangleq \frac{1}{A} \frac{1}{M} \sum_{m=1}^{M} \sum_{k=1}^{A} \mathbb{P}_{m,k}(\mathcal{E}), \]

where $\mathcal{E}$ indicates the event that the decoded message
does not correspond to the sent message, and where the
subscripts '$m,k$' indicate the conditioning on the event
that message $m$ starts being sent at time $k$.

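Second, we define the average communication rate
with respect to the average delay it takes the receiver
to react to a sent message, i.e.,²

\[ R \triangleq \frac{\ln M}{\mathbb{E}(\tau_N - \nu)^+}, \]

where $\mathbb{E}(\tau_N - \nu)^+$ is defined as

\[ \mathbb{E}(\tau_N - \nu)^+ \triangleq \frac{1}{A} \frac{1}{M} \sum_{m=1}^{M} \sum_{k=1}^{A} \mathbb{E}_{m,k}(\tau_N - k)^+, \]

where $\mathbb{E}_{m,k}$ denotes the expectation with respect to
$\mathbb{P}_{m,k}$, and where $x^+$ denotes $\max\{0, x\}$. With the above
definitions, we now recall the notions of $(R, \alpha)$ coding
scheme and capacity function.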

Definition 1 ($(R, \alpha)$ coding scheme). Given a channel
$Q$, a pair $(R, \alpha)$ is achievable if there exists a sequence
$\{(\mathcal{C}_N, (\tau_N, \phi_N))\}_{N \geq 1}$ of codebook/decoder pairs that
asymptotically achieves a rate $R$ at an asynchronism
exponent $\alpha$. This means that, for any $\varepsilon > 0$ and all
$N$ large enough, the pair $(\mathcal{C}_N, (\tau_N, \phi_N))$

• operates under asynchronism level $A = e^{(\alpha - \varepsilon)N}$;

• yields an average rate at least equal to $R - \varepsilon$;

• achieves an average error probability $\mathbb{P}(\mathcal{E})$ at most
equal to $\varepsilon$.

Given a channel $Q$, an $(R, \alpha)$ coding scheme is a
sequence $\{(\mathcal{C}_N, (\tau_N, \phi_N))\}_{N \geq 1}$ that achieves a rate $R$
at an asynchronism exponent $\alpha$ as $N \to \infty$.
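To make the quantities in the definition concrete, the sketch below computes the asynchronism level for a given exponent and an empirical version of the rate, i.e., the logarithm of the number of messages divided by the average reaction delay. It assumes decoding times and start times have been collected from repeated trials; the function names are illustrative and not taken from the paper.

```python
import math

def async_level(alpha, N):
    """Asynchronism level A = exp(alpha * N) for exponent alpha and length N."""
    return math.exp(alpha * N)

def empirical_rate(M, taus, nus):
    """Estimate ln(M) / E[(tau - nu)^+] in nats per channel use.

    taus, nus: decoding times and transmission start times from
    independent trials (hypothetical simulation output).
    """
    delays = [max(0, t - v) for t, v in zip(taus, nus)]
    mean_delay = sum(delays) / len(delays)
    return math.log(M) / mean_delay  # undefined if every delay is zero
```

For example, with 16 messages and observed delays of 7 and 5 channel uses, the estimate is $\ln 16 / 6$ nats per channel use. Because the denominator is the reaction delay rather than the codeword length, a decoder that stops late lowers the rate even if it decodes correctly.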

Definition 2 (Capacity of an asynchronous discrete
memoryless channel). The capacity of an asynchronous
discrete memoryless channel with (synchronized) capacity
$C(Q)$ is the function

\[ R \mapsto \alpha(R, Q), \]

where $\alpha(R, Q)$ is the supremum of the set of asynchronism
exponents that are achievable at rate $R$.

It turns out that the exponential scaling of the
asynchronism exponent with respect to the codeword length
in Definition 1 is natural: asynchronism induces a rate
loss with respect to the capacity of the synchronous
channel only when it grows at least exponentially with
the codeword length [1].

The following theorem, given in p), provides a non- 
trivial lower bound to the capacity of asynchronous 
channels: 

²$\ln$ denotes the natural logarithm.



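Theorem 1. For a given channel $Q$, let $\alpha \geq 0$ and let
$P$ be a distribution over $\mathcal{X}$ such that

\[ \min_V \max\{D(V \| (PQ)_{\mathcal{Y}}), D(V \| Q_\star)\} > \alpha \]

where the minimization is over all distributions over
$\mathcal{Y}$, and where the distribution $(PQ)_{\mathcal{Y}}$ is defined as
$(PQ)_{\mathcal{Y}}(y) = \sum_{x \in \mathcal{X}} P(x) Q(y|x)$, $y \in \mathcal{Y}$. Then, the pair
$(R = I(PQ), \alpha)$ is achievable.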
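The min-max expression in the theorem can be evaluated numerically. The sketch below relies on a standard fact that is not stated in the paper: when both distributions have full support, the minimum over $V$ of the larger of the two divergences is attained on the family of tilted distributions proportional to $p_1^\lambda p_2^{1-\lambda}$, at the point where the two divergences are equal (this common value is the Chernoff information). The function names are illustrative, and the full-support assumption is essential to this sketch.

```python
import math

def kl(v, p):
    """Kullback-Leibler divergence D(v || p) in nats; assumes p > 0 wherever v > 0."""
    return sum(vi * math.log(vi / pi) for vi, pi in zip(v, p) if vi > 0)

def tilt(p1, p2, lam):
    """Tilted distribution proportional to p1^lam * p2^(1 - lam)."""
    w = [a ** lam * b ** (1 - lam) for a, b in zip(p1, p2)]
    z = sum(w)
    return [x / z for x in w]

def min_max_divergence(p1, p2, iters=100):
    """Approximate min over V of max{D(V||p1), D(V||p2)} by bisection on lam.

    Assumes p1 and p2 have full support (an assumption of this sketch).
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        v = tilt(p1, p2, mid)
        if kl(v, p1) > kl(v, p2):
            lo = mid   # V is still closer to p2: tilt further towards p1
        else:
            hi = mid
    v = tilt(p1, p2, (lo + hi) / 2)
    return max(kl(v, p1), kl(v, p2))
```

Here `p1` plays the role of $(PQ)_{\mathcal{Y}}$ and `p2` the role of $Q_\star$. The returned value is zero exactly when the two distributions coincide, which matches the exceptional case singled out in the corollary that follows.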

Corollary 1. At capacity, it is possible to achieve a
strictly positive asynchronism exponent, except for the
case when $Q_\star$ corresponds to the capacity-achieving
output distribution of the synchronous channel.³ Moreover,
the asynchronism exponent achievable at capacity can
be arbitrarily large, depending on the channel.

This is in contrast with training-based schemes. The
contribution of this paper, given in the next section, is
to show that training-based schemes, in general, achieve
a vanishing asynchronism exponent in the limit of the
rate going to capacity.

II. Training-Based Schemes 

The usual approach to communication is a training- 
based architecture. In such schemes, each codeword is 
composed of two parts. The first part, the sync preamble, 
is a sequence of symbols common to all the codewords, 
hence carries no information; its only purpose is to help 
the decoder to locate the sent message. The second part 
carries information. The decoder operates according to a 
two-step procedure. First it tries to locate the codeword 
by seeking the sync preamble. Once the sync preamble 
is located, it declares a message based on the subsequent 
symbols. A formal definition of a training-based scheme 
follows. 

Definition 3. A training-based scheme is a coding
scheme $\{(\mathcal{C}_N, (\tau_N, \phi_N))\}_{N \geq 1}$ with the following properties.
For some $\varepsilon > 0$, $\eta \in [0, 1]$, and all integers $N \geq 1$

i. each codeword in $\mathcal{C}_N$ starts with a string of size
$\eta N$ that is common to all codewords;⁴

ii. the decision time $\tau_N$ is such that the event
$\{\tau_N = n\}$, conditioned on the $\eta N$ observations
$Y_{n-N+1}^{n-N+\eta N}$, is independent of all other past
observations, i.e., $Y_1^{n-N}$ and $Y_{n-N+\eta N+1}^{n}$;⁵

iii. the codebook $\mathcal{C}_N$ and the decoding time $\tau_N$ satisfy

\[ \mathbb{P}(\tau_N \geq k + 2N - 1 \mid \tau_N \geq k + N, \nu = k) \geq \varepsilon \]

for all $k \in \{1, 2, \ldots, A\}$.

³To see this, recall that, given the channel $Q$, all capacity-achieving
input distributions $P$ induce the same output distribution $(PQ)_{\mathcal{Y}}$.
Whenever $(PQ)_{\mathcal{Y}}$ differs from $Q_\star$, the min-max expression in
Theorem 1 is strictly positive. Therefore capacity is achievable at
a strictly positive asynchronism exponent.

⁴To be precise, the string size should be an integer, and instead of
having it equal to $\eta N$ we should have it equal to $\lfloor \eta N \rfloor$. However,
since we are interested in the asymptotic $N \to \infty$, this discrepancy
typically vanishes. Similar discrepancies are ignored throughout the
paper.

Condition i. specifies the size of the sync preamble. 
Condition ii. indicates that the decoding time should 
depend only on the sync preamble. Condition iii. imposes 
that the codeword symbols that follow the sync preamble 
should not be used to help the decoder locate the code- 
word. If we remove Condition iii., one could imagine 
having information symbols with a 'sufficiently biased' 
distribution to help the decoder locate the codeword 
position (the 'information symbols' could even start with 
a second preamble!). In this case the sync preamble is 
followed by a block of information symbols that also 
helps the decoder to locate the sent codeword. To avoid 
this, we impose Condition iii., which says that, once the
sync preamble is missed (this is captured by the event
$\{\tau_N \geq k + N, \nu = k\}$), the decoder's decision to stop
will likely no longer depend on the sent codeword since
it will occur after $k + 2N - 1$.

Finally, it can be shown that a large class of training- 
based schemes considered in practice satisfy the above 
three conditions. 
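Condition i. of the definition can be illustrated by a small construction. The following is a minimal sketch, assuming randomly drawn symbols for both the preamble and the information part; the random choice is purely for illustration (the paper does not prescribe how the symbols are selected), and the name `training_codebook` is hypothetical.

```python
import math
import random

def training_codebook(M, N, eta, alphabet, rng):
    """Build M codewords of length N sharing a common preamble of floor(eta * N) symbols.

    Returns (codebook, preamble_length). Symbols are drawn at random here
    only as a placeholder for an actual code design.
    """
    n_pre = math.floor(eta * N)
    preamble = [rng.choice(alphabet) for _ in range(n_pre)]
    codebook = []
    for _ in range(M):
        info = [rng.choice(alphabet) for _ in range(N - n_pre)]
        codebook.append(preamble + info)   # preamble first, then information symbols
    return codebook, n_pre
```

Only the last `N - n_pre` symbols of each codeword differ across messages, so at most that many channel uses carry information; this is the source of the rate penalty that grows with the preamble fraction.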

Theorem 2. A training-based scheme that achieves a
rate $R \in (0, C(Q)]$ operates at an asynchronism exponent
$\alpha$ upper bounded as

\[ \alpha \leq \left(1 - \frac{R}{C}\right) \max_P \min_W \max\{D_1, D_2\}, \]

where $D_1 = D(W \| Q | P)$ and $D_2 = D(W \| Q_\star | P)$.⁶ The
first maximization is over all distributions over $\mathcal{X}$ and the
minimization is over all conditional distributions defined
over $\mathcal{X} \times \mathcal{Y}$.

The following result is a consequence of Theorem 2.

Corollary 2. Unless the no-input symbol $\star$ does not
generate a particular channel output symbol (i.e., $Q(y|\star) = 0$
for some $y \in \mathcal{Y}$), training-based schemes achieve a
vanishing asynchronism exponent as $R \to C(Q)$.

Proof of Corollary 2: We consider the inequality
of Theorem 2 and first upper bound the minimization
by choosing $W = Q$. With this choice, the inner

⁵We use $Y_i^j$ for $Y_i, \ldots, Y_j$ (for $i \leq j$).

⁶We use the standard notation $D(W \| Q | P)$ for the Kullback-Leibler
distance between the joint distributions $P(\cdot)W(\cdot|\cdot)$ and
$P(\cdot)Q(\cdot|\cdot)$ (see, e.g., E p. 31]).



maximization becomes $D_2 = D(Q \| Q_\star | P)$ (since $D_1 =
D(Q \| Q | P) = 0$). Maximizing over $P$ yields

\[ \max_P D(Q \| Q_\star | P) = \max_{x \in \mathcal{X}} D(Q(\cdot|x) \| Q_\star), \]

which is bounded when $Q(y|\star) > 0$ for all $y \in \mathcal{Y}$.
Therefore the max-min-max term in the inequality of
Theorem 2 is finite and gets multiplied by a term that
vanishes as $R \to C(Q)$. ■
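The bound used in this proof is easy to evaluate for a concrete channel. The sketch below computes the largest divergence between a row of the channel matrix and the noise distribution, and multiplies it by the factor that vanishes as the rate approaches capacity. It evaluates only the relaxation obtained by choosing $W = Q$, not the max-min-max of the theorem itself; the function names and the matrix representation of the channel are assumptions of this sketch.

```python
import math

def kl(p, q):
    """D(p || q) in nats; returns infinity if q assigns zero mass where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def worst_case_divergence(Q, star):
    """max over inputs x of D(Q(.|x) || Q_star), with Q[x][y] = Q(y|x)."""
    return max(kl(row, Q[star]) for row in Q)

def training_exponent_bound(Q, star, R, C):
    """(1 - R/C) * max_x D(Q(.|x) || Q_star): the W = Q relaxation of the bound.

    Meaningful only when the divergence is finite, i.e. Q(y|star) > 0 for all y.
    """
    return (1 - R / C) * worst_case_divergence(Q, star)
```

For a binary symmetric channel with crossover probability 0.1 and `star = 0`, the worst-case divergence is $0.8 \ln 9$ nats, and the bound decreases linearly to zero as `R` approaches `C`.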

Thus, except for degenerate cases, training-based 
schemes achieve a vanishing asynchronism exponent in 
the limit of the rate going to capacity. In contrast, from
Theorem 1 one deduces that it is possible, in general, to
achieve a non-zero asynchronism exponent at capacity,
as we saw above.

This suggests that to achieve a high rate under strong 
asynchronism, separating synchronization from informa- 
tion transmission is suboptimal; the codeword symbols 
should all play the dual role of information carriers and 
'information flags.' 

Sketch of Proof of Theorem 2

Consider a training-based scheme
$\{(\mathcal{C}_N, (\tau_N, \phi_N))\}_{N \geq 1}$. For simplicity, we assume
that the sync preamble distribution of $\mathcal{C}_N$ is the same,
equal to $P$, for all $N \geq 1$. The case of different
preamble distributions for different values of $N$ requires
a minor extension. The proof consists in showing that
if the following two inequalities hold

\[ \eta D(W \| Q | P) < \alpha \tag{1} \]
\[ \eta D(W \| Q_\star | P) < \alpha \tag{2} \]

for some conditional distribution $W$, then the average
reaction delay achieved by $\{(\mathcal{C}_N, (\tau_N, \phi_N))\}_{N \geq 1}$ grows
exponentially with $N$. This, in turn, can be shown to
imply that the rate is asymptotically equal to zero. Therefore,
maximizing over the sync preamble distributions,
it is necessary that

\[ \alpha \leq \eta \max_P \min_W \max\{D(W \| Q | P), D(W \| Q_\star | P)\} \]

in order to achieve a strictly positive rate $R$. The second
part of the proof, omitted in this paper, consists in
showing that the highest value of $\eta$ compatible with rate
$R$ communication is upper bounded by $(1 - R/C(Q))$.
This with the above inequality yields the desired result.

Below we sketch the argument that shows that, if
both (1) and (2) hold, the average reaction delay grows
exponentially with $N$.
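Inequalities (1) and (2) can be checked numerically for a candidate conditional distribution $W$. The sketch below computes the two conditional divergences as averages over the preamble distribution $P$ and tests both inequalities; the matrix representation (`W[x][y]`, `Q[x][y]`) and the function names are assumptions made for illustration.

```python
import math

def kl(p, q):
    """D(p || q) in nats; infinity if q is zero where p is positive."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def cond_kl(W, Q, P):
    """D(W || Q | P) = sum_x P(x) D(W(.|x) || Q(.|x))."""
    return sum(px * kl(W[x], Q[x]) for x, px in enumerate(P) if px > 0)

def cond_kl_to_noise(W, q_star, P):
    """D(W || Q_star | P), where Q_star(.|x) = Q(.|star) for every x."""
    return sum(px * kl(W[x], q_star) for x, px in enumerate(P) if px > 0)

def both_inequalities_hold(W, Q, star, P, eta, alpha):
    """True if eta * D1 < alpha and eta * D2 < alpha, i.e. (1) and (2) hold."""
    d1 = cond_kl(W, Q, P)
    d2 = cond_kl_to_noise(W, Q[star], P)
    return eta * d1 < alpha and eta * d2 < alpha
```

Choosing `W = Q` makes the first divergence zero, so only the second inequality constrains the exponent; this is the choice used in the proof of Corollary 2.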

To keep the presentation simple, in the equations
below we omit terms that go to zero in the limit $N \to \infty$.
Thus, although the equations may not be valid as written,
they become valid in that limit.

Let $\{(\mathcal{C}_N, (\tau_N, \phi_N))\}_{N \geq 1}$ be a training-based scheme
with preamble empirical distribution equal to $P$. By
property ii., the stopping time $\tau_N$ is such that the
event $\{\tau_N = n\}$ depends only on the realizations of
$Y_{n-N+1}^{n-N+\eta N}$. For simplicity, instead of $\tau_N$, we are going
to consider the shifted stopping time $\tau'_N = \tau_N - (1 - \eta)N$
whose decision to stop at a certain moment depends on the
immediate $\eta N$ previously observed symbols. Clearly, $\tau'_N$
can be written as

\[ \tau'_N = \inf\{i \geq 1 : S_i = 1\}, \]

where each $S_i$ is some (decision) function defined over
$Y_{i-\eta N+1}^{i}$ and that takes on the values 0 or 1.
Condition iii. in terms of $\tau'_N$ becomes

\[ \mathbb{P}(\tau'_N \geq k + N + \eta N - 1 \mid \tau'_N \geq k + \eta N, \nu = k) \geq \varepsilon \tag{3} \]

for all $k \in \{1, 2, \ldots, A\}$.
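The representation of the shifted stopping time as the first index at which a window-based decision function fires can be sketched as follows. The decision function used in the usage note (an exact preamble match) is only one hypothetical choice; the paper leaves the decision functions unspecified. Truncating the window at the start of the sequence and stopping at the end of the observations are also assumptions of this sketch.

```python
def shifted_stopping_time(outputs, window, decide):
    """Return the smallest i >= 1 such that decide(Y_{i-window+1..i}) == 1.

    outputs: observed symbols Y_1, Y_2, ... as a list
    window:  number of most recent symbols each decision may depend on
    decide:  function mapping a list of symbols to 0 or 1
    If no decision fires, stop at the last observation.
    """
    for i in range(1, len(outputs) + 1):
        start = max(0, i - window)          # truncated window near the start
        if decide(outputs[start:i]) == 1:
            return i
    return len(outputs)

def exact_match_detector(preamble):
    """Hypothetical decision function: fire when the window equals the preamble."""
    def decide(window_symbols):
        return 1 if window_symbols == preamble else 0
    return decide
```

For instance, with preamble `[1, 0, 1]` and outputs `[0, 0, 0, 1, 0, 1, 0, 0]`, the rule stops at index 6, the first time the three most recent symbols match the preamble. Because each decision looks only at the last `window` symbols, the information symbols that follow the preamble cannot help locate it.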
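Let us define the events

\[ \mathcal{E}_1 = \{\tau'_N \geq \nu + \eta N\} \]
\[ \mathcal{E}_2 = \{S_i = 0 \text{ for } i \in \{\nu + N + \eta N - 1, \ldots, 3A/4\}\} \]
\[ \mathcal{E}_3 = \{\tau'_N \geq \nu + N + \eta N - 1\} \]
\[ \mathcal{E}_4 = \{\nu \leq A/4\}. \]

We lower bound the reaction delay as

\[ \mathbb{E}((\tau'_N - \nu)^+) \geq \mathbb{E}((\tau'_N - \nu)^+ \mid \mathcal{E}_1, \mathcal{E}_4)\, \mathbb{P}(\mathcal{E}_1, \mathcal{E}_4), \tag{4} \]

and consider the two terms on the right side separately.

We first show that $\mathbb{E}((\tau'_N - \nu)^+ \mid \mathcal{E}_1, \mathcal{E}_4) = \Omega(A)$.⁷ We
have

\[ \begin{aligned}
\mathbb{E}((\tau'_N - \nu)^+ \mid \mathcal{E}_1, \mathcal{E}_4)
&\geq \mathbb{E}((\tau'_N - \nu)^+ \mid \mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3, \mathcal{E}_4)\, \mathbb{P}(\mathcal{E}_2, \mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4) \\
&= \mathbb{E}((\tau'_N - \nu)^+ \mid \mathcal{E}_2, \mathcal{E}_3, \mathcal{E}_4)\, \mathbb{P}(\mathcal{E}_2, \mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4) \\
&= \mathbb{E}((\tau'_N - \nu)^+ \mid \tau'_N > 3A/4, \nu \leq A/4)\, \mathbb{P}(\mathcal{E}_2, \mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4) \\
&\geq \frac{A}{2}\, \mathbb{P}(\mathcal{E}_2, \mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4)
\end{aligned} \tag{5} \]

where the first equality holds since $\mathcal{E}_3 \subset \mathcal{E}_1$, and
where the second equality holds since $\mathcal{E}_2 \cap \mathcal{E}_3 =
\{\tau'_N > 3A/4\}$. We now prove that $\mathbb{P}(\mathcal{E}_2 \mid \mathcal{E}_1, \mathcal{E}_4)$ and
$\mathbb{P}(\mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4)$ have large probabilities for large $N$. This
implies that $\mathbb{P}(\mathcal{E}_2, \mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4)$ is bounded
away from zero for $N$ large enough. This together with
(5) implies that $\mathbb{E}((\tau'_N - \nu)^+ \mid \mathcal{E}_1, \mathcal{E}_4) = \Omega(A)$ as claimed
above.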

⁷$\Omega(\cdot)$ refers to the standard Landau order notation.



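For $\mathbb{P}(\mathcal{E}_2 \mid \mathcal{E}_1, \mathcal{E}_4)$ we have

\[ \begin{aligned}
\mathbb{P}(\mathcal{E}_2 \mid \mathcal{E}_1, \mathcal{E}_4)
&= \mathbb{P}(\mathcal{E}_2 \mid \mathcal{E}_4) \\
&= \mathbb{P}(S_{\nu + N + \eta N - 1}^{3A/4} = 0 \mid \nu \leq A/4) \\
&= \frac{1}{A/4} \sum_{k=1}^{A/4} \mathbb{P}(S_{k + N + \eta N - 1}^{3A/4} = 0 \mid \nu = k) \\
&= \frac{1}{A/4} \sum_{k=1}^{A/4} \mathbb{P}_\star(S_{k + N + \eta N - 1}^{3A/4} = 0) \\
&\geq \frac{1}{A/4} \sum_{k=1}^{A/4} \mathbb{P}_\star(S_{1}^{3A/4} = 0) \\
&= \mathbb{P}_\star(\tau'_N > 3A/4)
\end{aligned} \tag{6} \]

where $S_i^j = 0$ is shorthand for $S_l = 0$ for all $l \in \{i, \ldots, j\}$, and
where $\mathbb{P}_\star$ denotes the output distribution under pure
noise, i.e., when the $Y_i$'s are i.i.d. according to $Q_\star$.
For the first equality we used the independence between
$\mathcal{E}_2$ and $\mathcal{E}_1$ conditioned on $\mathcal{E}_4$. For the fourth equality
we noted that, conditioned on $\{\nu = k\}$, the event
$\{S_{k + N + \eta N - 1}^{3A/4} = 0\}$ is independent of the sent codeword (prefix
and information sequence), hence its probability is $\mathbb{P}_\star$.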

Now, the event $\{\tau'_N > 3A/4\}$ only depends on the
output symbols up to time $3A/4$. The probability of this
event under $\mathbb{P}_\star$ is thus the same as under the probability
distribution induced by the sending of a message after
time $3A/4$. Therefore, since the probability of error
vanishes for large $N$, and since a message starts being
sent after time $3A/4$ with (large) probability $1/4$, we
must have $\mathbb{P}_\star(\tau'_N > 3A/4) \approx 1$ for large $N$. Hence
from (6) we have

\[ \mathbb{P}(\mathcal{E}_2 \mid \mathcal{E}_1, \mathcal{E}_4) \approx 1 \tag{7} \]

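for large $N$. Now consider $\mathbb{P}(\mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4)$. Using (3), we
have

\[ \mathbb{P}(\mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4) \geq \varepsilon. \tag{8} \]

From (7) and (8) we deduce that $\mathbb{P}(\mathcal{E}_2, \mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4)$ is
the (conditional) probability of the intersection of two
large probability events. Therefore $\mathbb{P}(\mathcal{E}_2, \mathcal{E}_3 \mid \mathcal{E}_1, \mathcal{E}_4)$ is
bounded away from zero as $N \to \infty$.
Hence, we have shown that

\[ \mathbb{E}((\tau'_N - \nu)^+ \mid \mathcal{E}_1, \mathcal{E}_4) = \Omega(A) \tag{9} \]

as claimed earlier.

Second, we prove that

\[ \mathbb{P}(\mathcal{E}_1, \mathcal{E}_4) = \Omega(e^{-\eta N D_1} \mathrm{poly}(N)), \tag{10} \]

where $D_1 = D(W \| Q | P)$, $P$ denotes the type of the
preamble, and $\mathrm{poly}(N)$ denotes a quantity that goes to
0 at most polynomially quickly as a function of $N$.

We expand $\mathbb{P}(\mathcal{E}_1, \mathcal{E}_4)$ as

\[ \mathbb{P}(\mathcal{E}_1, \mathcal{E}_4) = \frac{1}{A} \sum_{k=1}^{A/4} \mathbb{P}_k(\tau'_N \geq k + \eta N), \tag{11} \]

where $\mathbb{P}_k$ represents the probability distribution of the
output conditioned on the event $\{\nu = k\}$. Further, by
picking a conditional distribution $W$ defined over $\mathcal{X} \times \mathcal{Y}$
such that $\mathbb{P}_k(Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)) > 0$,⁸ we lower bound the
term in the above sum as

\[ \mathbb{P}_k(\tau'_N \geq k + \eta N) \geq \mathbb{P}_k(\tau'_N \geq k + \eta N \mid Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)) \times \mathbb{P}_k(Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)). \tag{12} \]

We lower bound each of the two terms on the right side
of (12).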

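For the first term, a change of measure argument
reveals that

\[ \mathbb{P}_k(\tau'_N \geq k + \eta N \mid Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)) = \mathbb{P}_\star(\tau'_N \geq k + \eta N \mid Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)). \tag{13} \]

To see this, one expands

\[ \mathbb{P}_k(\tau'_N \geq k + \eta N \mid Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)) \]

by further conditioning on individual sequences in
$\mathcal{T}_W(P)$. Then, one uses the fact that, conditioned on a
particular such sequence, the channel outputs outside the
time window $\{k, k+1, \ldots, k + \eta N - 1\}$ are distributed
according to noise, i.e., i.i.d. according to $Q_\star$.

For the second term we have

\[ \mathbb{P}_k(Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)) \geq \mathrm{poly}(N)\, e^{-\eta N D_1} \tag{14} \]

using fS^, Lemma 2.6, p. 32], where $D_1 = D(W \| Q | P)$.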
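Combining (11), (12), (13), and (14) we get

\[ \begin{aligned}
\mathbb{P}(\mathcal{E}_1, \mathcal{E}_4)
&\geq \mathrm{poly}(N)\, \frac{e^{-\eta N D_1}}{A} \sum_{k=1}^{A/4} \mathbb{P}_\star(\tau'_N \geq k + \eta N \mid Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)) \\
&\geq \mathrm{poly}(N)\, \frac{e^{-\eta N (D_1 - D_2)}}{A} \sum_{k=1}^{A/4} \mathbb{P}_\star(\tau'_N \geq k + \eta N,\ Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)),
\end{aligned} \tag{15} \]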



where $D_2 = D(W \| Q_\star | P)$, and where for the second
inequality we again used E Lemma 2.6, p. 32].



⁸The set $\mathcal{T}_W(P)$ corresponds to all output sequences that,
together with the preamble, have joint type equal to $P(\cdot)W(\cdot|\cdot)$.



Now, assuming that $\alpha > \eta D_2$, one can show that

\[ \sum_{k=1}^{A/4} \mathbb{P}_\star(\tau'_N \geq k + \eta N,\ Y_k^{k+\eta N-1} \in \mathcal{T}_W(P)) = \Omega(A e^{-\eta N D_2}) \]

using the union bound. Therefore, under the above
assumption we get from (15) the desired claim that

\[ \mathbb{P}(\mathcal{E}_1, \mathcal{E}_4) = \Omega(e^{-\eta N D_1} \mathrm{poly}(N)). \tag{16} \]

From (4), (9), and (16), we conclude that if $\alpha > \eta D_2$
then

\[ \mathbb{E}((\tau'_N - \nu)^+) \geq \Omega(A e^{-\eta N D_1} \mathrm{poly}(N)). \]

Therefore, letting $A = e^{\alpha N}$, we deduce that, if, in
addition to the inequality $\alpha > \eta D_2$, we also have
$\alpha > \eta D_1$, the average reaction delay $\mathbb{E}((\tau'_N - \nu)^+)$ grows
exponentially with $N$. ■

Concluding Remarks 

Synchronization and information transmission of vir- 
tually all practical communication systems are performed 
separately, on the basis of different communication bits. 
Moreover, in general, the rate of these strategies is com- 
puted with respect to the information transmission time 
period, ignoring the delay overhead caused by various 
hand-shake protocols used to guarantee synchronization. 
In these cases, the notions of 'high rate' or
'capacity-achieving' communication strategies clearly raise
questions.

Building on an extension of Shannon's original point- 
to-point synchronous communication channel model to 
assess the overall rate performance of asynchronous 
communication systems, we showed that training-based 
schemes perform suboptimally at high rates. In this 
regime, it is necessary to envision communication 
strategies that integrate synchronization into information 
transmission. 